Importing the libraries

Loading the dataset

Choosing only those columns that's relevant for analysis and visualization

My goal is to help Lending Club team gain insight on what consitutes a 'Good Loans'and what constitutes a 'Bad Loans'. Good Loans are those loans with status of "Current", "Issued" and "Fully Paid" in the loan_status column. Bad Loans are those loans with the status other than "Current", "Issued" and "Fully Paid".

Particularly, I plan to answer the following questions for the Lending Club team:

Hence, the relevant variables for my theme, I believe, are:

id: A unique LC assigned ID for the loan listing.

term: The number of payments on the loan. Values are in months and can be either 36 or 60.

grade: LC assigned loan grade.

purpose: A category provided by the borrower for the loan request.

emp_title: The job title supplied by the Borrower when applying for the loan.

emp_length: Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.

loan_status: Current status of the loan.

Data Cleaning

Check if any of these columns have missing value.

Cleaning emp_length column by removing all the strings and converting employment length to integer. Missing values are filled with -1 value. The rows with the value '< 1 year' is converted to 0 and '10+ years' is converted to 10. This step is mainly done to make it easier to sort the column by years for visualization purpose. After they are sorted, their original name is restored. Finally, renaming it as emp_length_years to make it easier to identify the unit for this column.

Now, we move on to clean emp_title column. Since this column is derived from an open ended question, it is not really a categorical column. However, we could still derive some insights from this column. The missing values will need to be replaced with something like 'employer not mentioned', and we need to lowercase all the datapoints and remove extra whitespaces.

Create loan_type column that categorize each entry either 'Good Loans' or 'Bad Loans'. Loans with status of 'Fully Paid', 'Issued', or 'Current' as given 'Good Loans' value. The rest of the columns are given 'Bad Loans' value.

Saving the final dataframe as a CSV and Excel file

Loading the Cleaned Dataset

Helper Functions for Pie chart

Exploring, analysing and visualizing loan_status variable

Distribution of loan status

Pie chart showing distribution of loan status

Out of 39717 data points, the loan status consist of:

Exploring, analysing and Visualizing loan_type variable

Helper Function for Loan Segmentation

Loan Segment by Term

Generate the segmented df

Plot grouped bar chart based on proportion

Loan Segmentation by Grade

Generate the segmented df

Generate grouped bar chart (total)

Generate grouped bar chart (%)

Employment Duration and Good Loan

Explore emp_title column

Purpose for Bad Loans

Generate df

Generate horizontal bar chart